The United States economic environment is heavily reliant on the loan industry. Businesses use loans to start up and grow, individuals employ them to make large purchases or pay down other debt, and students depend on loans to finance attendance at educational institutions. Overall, loans are a key cog in what makes America the land of opportunity, and in turn the nation pays close attention to the health and regulation of the industry.
Prosper, a San Francisco-based lending marketplace, began business on February 15, 2006. Not just any loan broker, Prosper was the first ever peer-to-peer lending platform that connected borrowers and lenders. Through its unique format, borrowers can request loans anywhere from $2,000 to $35,000, and Prosper lists and matches them with investors to fund the loans.
Prosper received a cease and desist letter from the Securities and Exchange Commission on November 24, 2008. The SEC claimed that Prosper was issuing investments, though internally the company believed it was merely a marketplace that connected borrowers and lenders. In the end, Prosper registered with the SEC and resumed business in July 2009. During this time, the company improved its underwriting department and began issuing Prosper Ratings for each loan and Prosper Scores for each borrower.
The loan set for this study comes from publicly available Prosper-published data ranging from the company’s inception to the first quarter of 2014, after which the company began making this information available only to its investors. The set contains 113,937 loans with 82 variables describing their attributes. A data set this rich offers many avenues for insight, so a few were chosen to cover in depth.
Credit score, Prosper score, APR, default rates, and date of loan issue are all important metrics on either side of the loan. Historically significant economic developments occurred during the time frame of the loan set, so those implications were also kept in mind.
The study started with what most people instinctively associate with any kind of borrowing: credit score. In its analysis of potential borrowers, Prosper pulls each applicant’s FICO credit score. Slightly different from many credit scores, the FICO score has a range from 250 to 900.
The data set reports each score as a lower and an upper bound that are uniformly separated by 20 points, so the upper bound is used throughout for consistency.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 379.0 679.0 699.0 705.4 739.0 899.0
The above summary shows an interquartile range from 679 to 739, with both mean and median around 700 for credit scores. The following histogram confirms the same.
Using the same x-axis scale, the plot of credit scores for loans after Prosper’s registration with the SEC in 2009 is below. Notice no loans are below 600. This is just one example of the effect of regulation by the SEC on Prosper.
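The post-registration filter described above can be sketched in base R. This is a minimal illustration on made-up rows, not the study's own code: the column name `LoanOriginationDate` and the July 1, 2009 cutoff are assumptions (the text only says business resumed in July 2009), while `CreditScoreRangeUpper` is the name that appears in the output later in the report.

```r
# Toy stand-in for the Prosper data; all values are invented for illustration.
loans <- data.frame(
  LoanOriginationDate   = as.Date(c("2007-03-01", "2008-06-15",
                                    "2009-08-01", "2011-02-10", "2013-11-05")),
  CreditScoreRangeUpper = c(539, 619, 659, 719, 759)
)

# Assumed cutoff: first day of the month in which Prosper resumed business.
sec_cutoff <- as.Date("2009-07-01")

# Keep only loans originated on or after the cutoff.
post_sec <- loans[loans$LoanOriginationDate >= sec_cutoff, ]

# Lowest upper-bound score among post-registration loans.
min_post_sec <- min(post_sec$CreditScoreRangeUpper)
print(min_post_sec)
```

Comparing `min_post_sec` against 600 is one quick way to check the claim that no post-registration loans fall below that score.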
With a feel for the types of credit scores that borrowers held, the next step is to see how some of the variables supplied by Prosper relate to credit scores.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.0000 0.0000 0.5902 0.0000 83.0000 46
A brief summary of the variable for current delinquencies shows that a large majority of borrowers have none. The variable tracks the number of credit lines on which a borrower is behind on payments at the time of the new loan issue.
When an application does show outstanding delinquencies, it’s no surprise there are more at lower credit scores, and the loan status coloring shows they led to more charged off and defaulted loans between scores of 500 and 650.
The next plot explores credit usage. Specifically, credit lines opened in the last 7 years are summed, which should be a good measure of recent credit activity.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.00 17.00 25.00 26.77 35.00 136.00 46
The histogram shows a roughly normal distribution with a small right tail of outliers that have a lot of recent credit activity.
Higher credit scores, especially around the mean at 700, seem to trend towards having opened more credit lines. Intuitively this makes sense, for it should be easier to receive a good credit offer at higher FICO scores leading to more credit agreements.
The final variable for exploration is the length of current employment for borrowers at the time the loan was issued.
The distribution is right-skewed, with most borrowers at shorter tenures. A possible explanation is that people who have worked for fewer years have less built-up capital and a greater need for a loan.
There seems to again be the highest duration of employment around the FICO mean of 700, which represents a working class citizen with a stable job and good credit.
The three plots above show interesting variables that to a degree get factored into a final credit score, but at the end of the day what a borrower really is concerned with is the APR of their loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15630 0.20980 0.21880 0.28380 0.51230 25
The full spectrum of rates above displays a roughly normal distribution with a mean APR of 21.88%.
In the end, interest rates are only as good as the borrower’s ability to pay back the loan.
Current loans in the set are ones still active as of 2014, so a good portion of our set is recent. In all, almost 40,000 loans have been completed in Prosper’s young history.
Taking this all into consideration, the plot below of all APRs and accompanying credit scores of Prosper loans is confusing as shown, because too many unfinished loans are included. There also does not seem to be an overwhelmingly large correlation between credit score and APR to the naked eye.
Therefore, we need to look only at loans that are closed (either defaulted, charged off, or paid off). The graph below is a bit clearer. Fully paid off loans seem to have a negative correlation, and loans that were not repaid congregate more heavily lower on the credit scale and higher on the APR scale.
##
## Pearson's product-moment correlation
##
## data: loan.nocurrent$CreditScoreRangeUpper and loan.nocurrent$BorrowerAPR
## t = -113.1, df = 54359, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4432444 -0.4296344
## sample estimates:
## cor
## -0.4364644
As suspected, there’s a moderate negative correlation. In other words, APR decreases as credit score increases for the Prosper loans that were completed or defaulted, though the relationship is not quite as strong as I had anticipated.
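As a sanity check on the output above, the reported t statistic can be recovered from the correlation and the degrees of freedom using t = r * sqrt(df) / sqrt(1 - r^2). The sketch below only plugs in the numbers printed by `cor.test`; the variable names are illustrative.

```r
# Values copied from the cor.test output above.
r  <- -0.4364644
df <- 54359

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_stat, 1))

# Share of variance in APR that a straight-line fit on credit score explains.
r_squared <- r^2
print(round(r_squared, 3))
```

With r squared near 0.19, credit score alone accounts for roughly a fifth of the variation in APR, which fits the description of the relationship as moderate.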
With a pretty good grasp of what the credit score variable ranges and implications look like, it’s necessary to turn to Prosper’s own method of grading the reliability of borrowers. When a loan is applied for, a Prosper Score is assigned to the potential borrower’s account. The score is based on past historical performance of Prosper loans to borrowers with similar characteristics according to the company. The metric was created in 2009, so all loans in the set from this point on feature the statistic.
The summary and histogram below show the distribution of Prosper scores for users. The distribution appears to be fairly normal with a minimum of 1 and maximum of 11.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 4.00 6.00 5.95 8.00 11.00 29084
How does it compare to credit scores? One would expect the two to be pretty strongly correlated since they both represent measures of a person’s ability to pay back a loan. The scatterplot does seem to show a positive correlation, but just how strong?
##
## Pearson's product-moment correlation
##
## data: loan.realcscore$ProsperScore and loan.realcscore$CreditScoreRangeUpper
## t = 115.87, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3637793 0.3753979
## sample estimates:
## cor
## 0.369603
Pearson’s r confirms a positive correlation, though not as strong as one might expect. From this we can conclude that Prosper takes into account more than just credit score when grading its borrowers.
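The confidence interval in the output above can be reproduced with the Fisher z-transformation, the usual approach for an interval around a Pearson correlation. This is a minimal sketch using only the printed values, and it assumes the sample size is df + 2.

```r
# Values copied from the cor.test output above.
r  <- 0.369603
df <- 84851
n  <- df + 2                  # a Pearson test has n - 2 degrees of freedom

z      <- atanh(r)            # Fisher z-transform of r
se     <- 1 / sqrt(n - 3)     # approximate standard error of z
z_crit <- qnorm(0.975)        # two-sided 95% critical value

# Transform the interval endpoints back to the correlation scale.
ci <- tanh(z + c(-1, 1) * z_crit * se)
print(round(ci, 4))
```

The interval is narrow because the sample is large, so the modest size of the correlation is unlikely to be a sampling artifact.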
Now let’s take a peek at a few metrics from the data set concerning the investor side. Two metrics on percentages of money returned to investors are lender yield and estimated return.
The lender yield variable (interest rate less servicing fees) carries the larger return number. The reason is that it considers only the rate charged to the borrower, whereas estimated return takes into account the borrower’s ability to pay it back.
We first plot lender yields and Prosper Scores for each loan.
A quick look at lender yield shows that investors can generally expect a higher yield on borrowers with low Prosper Scores in a perfect world where all loans are repaid. Not all loans are repaid, however, and Prosper attempts to account for that in the next plot.
This plot of estimated return represents a way for Prosper to better predict what will be returned to the lender, taking into account estimated principal amount loss of a charge off, uncollected interest payments, and collected late fees. This is why the return at lower Prosper Scores is much lower than the previous lender yield above and even negative in some cases.
The median lines show loan statuses along each Prosper Score, and completed loans had a lower estimated return, as they were presumably safer bets to start with.
Building on this, we next take a look at estimated return means by Prosper Score to see how Prosper predicted returns based on their own risk measure.
Interestingly, the expected return does not begin to fall until around a Prosper Score of 6, after which it steadily decreases to 5%. This suggests that Prosper’s models do not predict borrowers with scores between 1 and 6 to be any riskier than one another, given that they start with borrowing rates fitting their score.
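A grouped mean like the one discussed above can be computed with base R's `aggregate`. The sketch uses the column names `ProsperScore` and `EstimatedReturn` that appear in the report's output, but the rows are invented, so the numbers themselves carry no meaning.

```r
# Toy rows; EstimatedReturn values are invented for illustration.
loans <- data.frame(
  ProsperScore    = c(1, 1, 6, 6, 11, 11),
  EstimatedReturn = c(0.12, 0.10, 0.11, 0.09, 0.05, 0.06)
)

# Mean estimated return for each Prosper Score. The formula interface
# drops rows with missing values by default.
mean_return <- aggregate(EstimatedReturn ~ ProsperScore, data = loans, FUN = mean)
print(mean_return)
```

Swapping `FUN = mean` for `FUN = median` gives the per-score medians instead.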
After looking at two variables in credit score and Prosper Score that are important to lenders, it’s time to get a deeper understanding of the most important metric on a loan for the consumer: annual percentage rate (APR).
The plot above is a modified histogram of the same borrower APR data plotted in the previous section. The mean APR of all loans hovers around 0.2, and the histogram shape is fairly normal as expected. However, this time the bin widths were scrutinized, and a large spike between 0.3 and 0.4 was discovered.
To see if it has any relation to the status of the loans, the same APRs were plotted but separated by loan status.
The spike is fairly consistent across status types, so the next step is to determine where it occurs and what it represents.
By manipulating the x-axis to spotlight the area, it’s found the count of over 3,000 loans occurs between APRs of 0.357 and 0.358.
## [1] 3672
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.358 0.358 0.358 0.358 0.358 0.358
After subsetting to include only these, the outlier section produced 3,672 loans of the same borrower APR (0.35797).
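One way to locate a pile-up like this without zooming the axis by hand is to tabulate the distinct APR values and take the most common one. The sketch below runs on an invented vector, and matching on the factor label rather than on the numeric value is a design choice made here to avoid floating-point equality issues; it is not necessarily how the study isolated the spike.

```r
# Toy APR vector with an artificial pile-up at 0.35797.
apr <- c(0.1512, 0.2098, 0.35797, 0.35797, 0.35797, 0.2838, 0.35797, 0.1899)

# Treat each distinct APR as a category and count the loans in each.
apr_factor <- factor(apr)
apr_counts <- table(apr_factor)

# Label and size of the most common APR.
spike_level <- names(which.max(apr_counts))
spike_count <- max(apr_counts)

# Subset to the loans sitting exactly on the spike.
spike_loans <- apr[apr_factor == spike_level]
print(spike_level)
print(spike_count)
```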
Further investigation below shows they are all small loans with the lowest Prosper Rating, issued mostly in 2012. For the purpose of getting measurements on the loan ratings, their numeric rating is used fairly extensively in the study. For reference, the loans rated 7 down to 1 (7 being best, 1 being worst) correspond to the letter ratings AA, A, B, C, D, E, and HR.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 3000 4000 3530 4000 4000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 1 1 1 1 1
##
## 2011 Q4 2012 Q1 2012 Q2 2012 Q3 2012 Q4 2013 Q1
## 112 779 951 1093 720 17
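The numeric-to-letter correspondence for Prosper Ratings given above (7 = AA down to 1 = HR) can be written as a simple lookup vector. The variable names below are illustrative and the example ratings are made up.

```r
# Letter ratings ordered from worst (numeric 1) to best (numeric 7).
rating_letters <- c("HR", "E", "D", "C", "B", "A", "AA")

# Example numeric ratings; index into the lookup vector to translate them.
numeric_rating <- c(1, 4, 7, 2)
letter_rating  <- rating_letters[numeric_rating]
print(letter_rating)

# An ordered factor keeps the worst-to-best ordering for plots and tables.
rating_factor <- factor(letter_rating, levels = rating_letters, ordered = TRUE)
```

Keeping the ratings as an ordered factor means comparisons such as `rating_factor > "C"` respect the rating scale rather than alphabetical order.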
Upon further review of Prosper’s business history, beginning in 2010 Prosper changed its business model to use pre-set rates for loans based on its own formula for the borrower’s risk. Previously, from 2006 to 2009, the company used a variable rate model employing an eBay-esque system in which borrowers and lenders determined rates through Dutch auction style bidding. Therefore, the huge number of loans at the same, very specific APR makes a little more sense occurring at this time with pre-set rates.
Looking further into 2012’s data, the number of lowest rated loans in proportion to the rest issued that year is also surprising. Below are two graphs: the first shows loans issued in 2012 and the second shows loans issued in all other years between 2009 and 2014.
The number of loans rated 1 is striking. We next look at the rest of the data to compare.
Comparing these, Prosper approved more lowest rated loans in 2012 than in all other years combined. Since very few loans were issued in 2009 after the reboot and the company did not release full 2014 data to the public, the following chart looks at the years in between.
Here we notice the number of loans growing and the distribution of loans for each Prosper Rating changing through the years. An in-depth look at this graph follows in the Final Plots section. However, before we move on from the topic, it’s worth showing that the size of the loan largely does not matter when it comes to actually paying back the loan or defaulting. The delinquent bar does show some variation from the other three, but it’s the most fluid since it holds recent active loans rather than closed accounts like the other three.
It’s become obvious at this point that the date the loan was issued matters a great deal. An investor when deciding to fund a loan would ultimately want to know what they can expect back, so below is plotted the median estimated return over time.
In 2009 and 2014, estimated returns were similar, but significant variation shows in between as the company recovers in a volatile economic climate and adapts to a new pricing strategy. Overall, one could expect higher returns on smaller loans, confirmed by the negative correlation in Pearson’s R.
##
## Pearson's product-moment correlation
##
## data: loan$EstimatedReturn and loan$LoanOriginalAmount
## t = -86.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2922833 -0.2799279
## sample estimates:
## cor
## -0.2861175
Since estimated return and borrower APR are related, one would expect APR over the years to also have a similar shape.
The shape does resemble the plots above, but there’s a lot more to be gleaned here. There’s a detailed explanation of the graph to follow in the Final Plots section.
So far we see number of loans issued increase and average APR decrease by 2014 at Prosper, but what about the size of the loans issued?
A pretty staggering increase in loan size appears for a company originally built on peer-to-peer lending.
So now we have decreasing APR, more loans, and more big money flowing into Prosper loans. Why? Let’s look at the number of investors that fund each loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
This is quite a large spread for the number of investors in a single loan, but one would expect investors per loan to be increasing over time to be able to fund all these larger loans.
Instead, the average number of investors per loan is decreasing. How can this be? According to a 2014 article by Nav Athwal in Forbes Magazine, big banks and institutions, the very structures peer-to-peer lending was created to cut out of the process, began to flood the market. With capital far greater than a typical individual investor, these outfits raced in and “the peer lender is now the faceless institution with little motivation other than monetary return on investment”.
With a good understanding of the new methodology of Prosper in tow, we finish with a glimpse back at data from during the United States economic recession and loan crisis of the 2000s.
To do this, we’ll be looking at defaults. The first plot here shows default rates at Prosper throughout the entirety of the data set.
Here the completion rate of loans is shown, but this is misleading for a few reasons. We ultimately want default rates, and Prosper was not in business in the first quarter of 2009. We also only know the true ratio of finished loans for those issued up to 2009, because loans issued in later years may not have reached their finish date (e.g., a 5-year loan issued in 2010 has no known outcome by the end of the data set in the first quarter of 2014).
Therefore, it was determined best to plot loans issued just prior to Prosper closing its doors in 2008. Also, to get a full grasp of the struggle borrowers had in repaying their loans, each account that was defaulted, charged off, or currently delinquent was counted as being in some degree of loan delinquency.
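The delinquency measure described above, which pools defaulted, charged off, and currently delinquent accounts, can be sketched as a flag plus a per-quarter proportion. The column names `Quarter` and `LoanStatus` and the exact status labels are assumptions for illustration, and the rows are invented.

```r
# Toy rows; status labels and column names are assumptions for illustration.
loans <- data.frame(
  Quarter    = c("2006 Q4", "2006 Q4", "2007 Q1", "2007 Q1", "2007 Q1", "2008 Q1"),
  LoanStatus = c("Defaulted", "Completed", "Chargedoff", "Completed",
                 "Past Due", "Completed")
)

# Statuses counted as "some degree of delinquency" in this sketch.
bad_status <- c("Defaulted", "Chargedoff", "Past Due")
loans$Delinquent <- loans$LoanStatus %in% bad_status

# Share of delinquent loans and number of loans per quarter of issue.
rate_by_quarter  <- tapply(loans$Delinquent, loans$Quarter, mean)
count_by_quarter <- tapply(loans$Delinquent, loans$Quarter, length)
print(round(rate_by_quarter, 2))
```

Reporting the count alongside the rate makes it easier to tell when a high rate comes from only a handful of loans.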
The company opened in 2006, so it’s no surprise that volume increased each quarter but slowed to a halt around the shutdown. A larger proportion of defaults on loans issued in 2006 fits with the idea that Americans had trouble paying these off when the economic downturn began in 2007. 2008 saw the most loans issued on this graphic, but its lower default rate suggests that the borrowers taking out loans were people who could actually afford to repay them, unlike the irresponsible loan recipients that caused the subprime loan disaster.
How did each state fare in these tough environments? The following graph is a detailed look at default and delinquency rates of borrowers for all states until 2009.
A red tinge is more consistently present for states out west, and Texas and Alabama especially had trouble repaying their loans during this time period compared to the rest of the country.
It’s no surprise that lower rates appear prior to 2008, as it was a time when loans were issued at lower rates to people who ended up not being able to afford them. At the largest economic scale, this culminated in the mortgage loan crisis beginning in 2007.
Once Prosper reopened its doors under SEC regulations, it appears to have averaged a higher APR for borrowers. This coincides with the increase in volume of lower rated loans previously plotted. After the company switched its pricing system on December 19, 2010, it only took a few quarters for its rates to drop significantly. Again, this goes hand in hand with the influx of institutional businesses into the peer-to-peer industry, with both size and volume of loans increasing.
In my opinion, the loans issued per year colored by Prosper Rating is the most important and interesting plot from the study. It portrays the company’s business strategy and the direction the industry has gone following the SEC’s insistence that peer to peer lenders must be registered. The first and most obvious trend is the pure volume of loans increasing at the company. It speaks to the power of the company to overcome its hardships and also to the relative strength of the p2p industry.
The type of loans Prosper issued is the other glaring trend. In a new environment with stricter qualifications to borrow and a new pricing structure, the proportion of higher rated loans issued increased as Prosper gained traction again. As previously mentioned, there’s no doubt that the institutional influence was strong in creating this.
The western United States seemed to struggle more, for between 20% and 30% of the loans from a majority of these states were in some level of delinquency. Texas and Alabama were the only states to break the 30% threshold. The northeast fared relatively well other than New Hampshire and Rhode Island, and it should be mentioned that no loans were issued in South Dakota. In all, the plot shows that nearly all of the nation experienced some hardship during the economic downturn that spanned 2007-2010, with many Americans struggling to repay their loans.
The following table shows the six most affected states with specific ratios. Three of the six (Rhode Island, New Hampshire, and Delaware) are small states with few loans during the time period, which speaks to the relative newness of the company in these years.
## # A tibble: 6 × 4
## BorrowerState ratio n BorrowerStateFull
## <chr> <dbl> <int> <chr>
## 1 TX 0.3326111 923 texas
## 2 AL 0.3037249 349 alabama
## 3 RI 0.2777778 18 rhode island
## 4 MO 0.2725724 587 missouri
## 5 NH 0.2692308 78 new hampshire
## 6 DE 0.2592593 27 delaware
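To put the small-state ratios in the table above in perspective, the sketch below recovers the implied number of delinquent loans per state and an approximate standard error for each ratio, using sqrt(p * (1 - p) / n). It relies only on the `ratio` and `n` values printed in the table; the derived column names are illustrative, and the simple binomial standard error is an assumption rather than part of the original analysis.

```r
# Ratios and loan counts copied from the table above.
states <- data.frame(
  BorrowerState = c("TX", "AL", "RI", "MO", "NH", "DE"),
  ratio = c(0.3326111, 0.3037249, 0.2777778, 0.2725724, 0.2692308, 0.2592593),
  n     = c(923, 349, 18, 587, 78, 27)
)

# Implied number of delinquent loans in each state.
states$delinquent <- round(states$ratio * states$n)

# Approximate standard error of a proportion: sqrt(p * (1 - p) / n).
states$se <- sqrt(states$ratio * (1 - states$ratio) / states$n)

# Show the least certain ratios first.
print(states[order(-states$se), c("BorrowerState", "n", "delinquent", "se")])
```

Rhode Island's ratio rests on 5 of 18 loans, so a single additional delinquent loan would move it by more than five percentage points, whereas the Texas ratio is based on several hundred loans.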
The Prosper data set that the company made public from 2006 to 2014 initially looks overwhelming, with 113,937 individual loans and 82 variables to describe them. The first step was to try to understand what each variable contributed and why Prosper might require it in an application. There weren’t any huge questions that initially arose, so for the most part I decided to choose variables the general public can relate to and let these speak for themselves concerning their relation to the actual loan. These were variables that are prominent on television, in newspapers, and in financial circles, which I felt would be beneficial for anyone reading the study to understand further, both in general and specifically in relation to the Prosper Marketplace.
Many of the successes and leads from the data set came from looking at the graphs in univariate or bivariate plots. Once a trend or interesting point was found in simple plots of the mainstream variables, it was simpler to explore how that might fit in with more vague or less consequential metrics. It was also beneficial to have some previous background information on the economics of the time period. Once I decided to group the data by year and quarter, trends became much more apparent.
With a data set this large and some confusing variables, problems unfortunately arose throughout. For one, the data set stopped in 2014, so many comparisons I wanted to make for default rates would have been inaccurate. The economic environment of Prosper as a company changed dramatically during the time frame provided. It shut its doors halfway through only to reopen under different regulations and even implemented a different business strategy. With a timeline like this, until a full understanding of the company was grasped it was very hard to discern what the trends were and why they were present. Drawing conclusions from the data as a whole proved misleading at times without taking into account whether the variable was directly impacted by regulations or business strategy.
There are many future follow-ups that would help explain some of the trends noticed and also compare Prosper Marketplace to other lenders. Bank loan data, especially from 2006 to 2008, would be great for comparing how institutions fared relative to Prosper. Similarly, data from Lending Club, the other prominent peer-to-peer lending company in America, could provide some interesting conclusions about how successful the company is and how their business strategies might differ. A complete data set for Prosper that stretches to the present day would help detail how the changes implemented after filing with the SEC have changed the structure of the company. Finally, a list of investors could help show what type of lenders use Prosper and might firm up the theory of large institutions flocking to this lending platform.